Sprint 3 Week 9 Plan

EPGOAT Documentation - Work In Progress

Sprint 3 Week 9 Plan: Medium File Refactoring (Services Layer)

Date: 2025-11-05
Last Updated: 2025-11-09
Sprint: Sprint 3 - Medium File Refactoring
Week: Week 9 (Batch 3A: Services)
Duration: 1 week
Priority: 🟡 P2 - Medium Technical Debt


Executive Summary

Sprint 3 Week 9 focuses on refactoring 8 service files (300-500 lines) by extracting long functions (>50 lines) and adding error handling. Unlike Sprint 2's file splits, this sprint uses function extraction to break down complex methods while keeping files intact.

Goal: Eliminate all functions >50 lines in Batch 3A service files

Approach:
- Extract helper methods from long functions
- Add try/except blocks for risky operations
- Improve logging for debugging
- Maintain 100% backward compatibility

Files: 8 service files (3,082 lines total)


Sprint 3 Overview

Sprint 3 Structure

Sprint 3 Week 9 (Batch 3A: Services)
- 8 service files
- Focus: Extract long functions, add error handling
- Pattern: Function extraction

Sprint 3 Week 10 (Batch 3B: Data & Database)
- 7 data/database/client files
- Same approach: Function extraction + error handling

Total Sprint 3: 15 files (~5,900 lines: 3,082 in Batch 3A + ~2,800 in Batch 3B)


Batch 3A: Services Files (Week 9)

Files Summary

| # | File | Lines | Long Functions | Longest | Approach |
|---|------|-------|----------------|---------|----------|
| 3.1 | family_league_inference.py | 434 | 3 | 78L | Extract inference helpers |
| 3.2 | logo_generator.py | 322 | 1 | 99L | Extract image processing steps |
| 3.3 | match_debug_logger.py | 459 | 1 | 181L! | Extract Excel sheet writers |
| 3.4 | match_suggestions.py | 382 | 1 | 56L | Extract similarity calculation helpers |
| 3.5 | provider_config_manager.py | 474 | 3 | 119L! | Extract cache/DB helpers |
| 3.6 | provider_orchestrator.py | 394 | 1 | 89L | Extract provider processing steps |
| 3.7 | scoped_team_extractor.py | 313 | 1 | 94L | Extract regex matching helpers |
| 3.8 | enhanced_match_cache.py | 304 | 0 | 42L | Add error handling only |
| Total | 8 files | 3,082 | 11 | 181L | Function extraction |

Detailed File Analysis

Task 3.1: family_league_inference.py (434 lines)

Current State: 434 lines, 3 long functions

Long Functions Identified:
1. _infer_from_event_context(): 78 lines (317-394) - Event context inference
2. _infer_from_teams(): 74 lines (242-315) - Team-based inference
3. infer_leagues(): 63 lines (111-173) - Main inference coordinator

Refactoring Approach:

For _infer_from_event_context() (78 lines):

# Current: 78-line monolith
def _infer_from_event_context(self, teams, payload):
    # ... 78 lines of event context matching ...

# After: Extract 3 helpers
def _infer_from_event_context(self, teams, payload):
    season_matches = self._extract_season_info(payload)
    tournament_matches = self._extract_tournament_info(payload)
    competition_matches = self._extract_competition_info(payload)
    return self._merge_event_candidates(season_matches, tournament_matches, competition_matches)

def _extract_season_info(self, payload): ...  # 20 lines
def _extract_tournament_info(self, payload): ...  # 20 lines
def _extract_competition_info(self, payload): ...  # 20 lines
def _merge_event_candidates(self, *candidates): ...  # 15 lines

For _infer_from_teams() (74 lines):

# Current: 74-line monolith
def _infer_from_teams(self, team1, team2):
    # ... 74 lines of team matching ...

# After: Extract 2 helpers
def _infer_from_teams(self, team1, team2):
    team1_leagues = self._find_team_leagues(team1)
    team2_leagues = self._find_team_leagues(team2)
    return self._intersect_league_candidates(team1_leagues, team2_leagues)

def _find_team_leagues(self, team_name): ...  # 30 lines
def _intersect_league_candidates(self, leagues1, leagues2): ...  # 25 lines

For infer_leagues() (63 lines):

# Current: 63-line coordinator
def infer_leagues(self, family, team1, team2, payload, provider):
    # ... 63 lines of orchestration ...

# After: Extract validation helper
def infer_leagues(self, family, team1, team2, payload, provider):
    if not self._validate_inference_inputs(family, team1, team2):
        return []
    # ... rest of logic (now <50 lines) ...

def _validate_inference_inputs(self, family, team1, team2): ...  # 15 lines

Estimated Time: 3 hours
- Analyze and extract helpers: 1.5 hours
- Test and verify: 1 hour
- Add error handling: 30 min


Task 3.2: logo_generator.py (322 lines)

Current State: 322 lines, 1 long function

Long Function Identified:
1. generate_split_logo(): 99 lines (179-277) - Creates split diagonal team logos

Refactoring Approach:

For generate_split_logo() (99 lines):

# Current: 99-line monolith
def generate_split_logo(self, team1_info, team2_info, ...):
    # ... 99 lines of image processing ...

# After: Extract image processing steps
def generate_split_logo(self, team1_info, team2_info, ...):
    # Get cached or generate
    cached = self._get_cached_logo(cache_key)
    if cached:
        return cached

    # Generate new split logo
    canvas = self._create_canvas()
    logo1_img = self._load_team_logo(team1_info)
    logo2_img = self._load_team_logo(team2_info)
    mask = self._create_diagonal_mask(canvas.size)
    result = self._composite_split_logos(canvas, logo1_img, logo2_img, mask)

    return self._save_and_return(result, cache_key)

def _create_canvas(self): ...  # 10 lines
def _load_team_logo(self, team_info): ...  # 20 lines
def _composite_split_logos(self, canvas, logo1, logo2, mask): ...  # 25 lines
def _save_and_return(self, image, cache_key): ...  # 15 lines

Error Handling to Add:
- Wrap _download_image() with try/except (network failures)
- Handle PIL image processing errors
- Log failures with team info for debugging
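A minimal sketch of this defensive pattern, assuming a hypothetical `fetch_image` callable that stands in for the real download + PIL decode step (names here are illustrative, not the module's actual API):

```python
import logging

logger = logging.getLogger(__name__)

def load_team_logo(team_info, fetch_image):
    """Return the decoded logo image, or None if fetching/decoding fails.

    fetch_image is a hypothetical stand-in for the real download +
    PIL decode step; team_info is assumed to be a dict with
    "name" and "logo_url" keys.
    """
    try:
        return fetch_image(team_info["logo_url"])
    except Exception as e:
        # Include team info so failed logos can be traced in the logs
        logger.error("Logo load failed for %s: %s", team_info.get("name"), e)
        return None
```

Returning None on failure lets the caller fall back to a placeholder logo instead of aborting the whole generation run.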

Estimated Time: 2 hours
- Extract image processing helpers: 1 hour
- Add error handling: 30 min
- Test with sample logos: 30 min


Task 3.3: match_debug_logger.py (459 lines)

Current State: 459 lines, 1 extremely long function

Long Function Identified:
1. _export_excel(): 181 lines (278-458) - Excel export with multiple sheets

Refactoring Approach:

For _export_excel() (181 lines) - Similar to Task 2.9 (analyze_mismatches.py):

# Current: 181-line monolith
def _export_excel(self):
    # ... 181 lines creating 4-5 Excel sheets ...

# After: Extract sheet writers
def _export_excel(self):
    wb = Workbook()
    self._write_summary_sheet(wb)
    self._write_channels_sheet(wb)
    self._write_cache_attempts_sheet(wb)
    self._write_db_queries_sheet(wb)
    self._write_api_calls_sheet(wb)
    wb.save(self.excel_path)

def _write_summary_sheet(self, wb): ...  # 30 lines
def _write_channels_sheet(self, wb): ...  # 35 lines
def _write_cache_attempts_sheet(self, wb): ...  # 30 lines
def _write_db_queries_sheet(self, wb): ...  # 30 lines
def _write_api_calls_sheet(self, wb): ...  # 35 lines

Pattern: Same as analyze_mismatches.py Task 2.9 (excel_exporter.py)

Estimated Time: 1.5 hours
- Analyze Excel structure and extract 5 sheet writers: 1 hour
- Test Excel output: 30 min


Task 3.4: match_suggestions.py (382 lines)

Current State: 382 lines, 1 long function

Long Function Identified:
1. calculate_similarity(): 56 lines (250-305) - Multi-factor similarity calculation

Refactoring Approach:

For calculate_similarity() (56 lines):

# Current: 56-line monolith
def calculate_similarity(self, unmatched, event):
    # ... 56 lines of similarity scoring ...

# After: Extract scoring components
def calculate_similarity(self, unmatched, event):
    team_score = self._calculate_team_similarity(unmatched, event)
    date_score = self._calculate_date_similarity(unmatched, event)
    time_score = self._calculate_time_similarity(unmatched, event)
    league_score = self._calculate_league_similarity(unmatched, event)

    total_score = (team_score * 0.5 + date_score * 0.3 +
                   time_score * 0.1 + league_score * 0.1)

    return min(total_score, 100.0)

def _calculate_team_similarity(self, unmatched, event): ...  # 15 lines
def _calculate_date_similarity(self, unmatched, event): ...  # 10 lines
def _calculate_time_similarity(self, unmatched, event): ...  # 10 lines
def _calculate_league_similarity(self, unmatched, event): ...  # 10 lines

Benefits:
- Each similarity component independently testable
- Easy to adjust weights (currently 50/30/10/10)
- Clear separation of concerns
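The 50/30/10/10 blend can be sketched as a standalone function with the weights as a parameter, which makes tuning trivial (function and parameter names are illustrative):

```python
def combine_similarity(team, date, time_, league, weights=(0.5, 0.3, 0.1, 0.1)):
    """Weighted blend of component scores (each 0-100), capped at 100.

    Default weights mirror the 50/30/10/10 split described above.
    """
    w_team, w_date, w_time, w_league = weights
    total = team * w_team + date * w_date + time_ * w_time + league * w_league
    return min(total, 100.0)
```

Passing a different `weights` tuple lets an experiment rebalance the factors without touching the scoring helpers themselves.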

Estimated Time: 1.5 hours
- Extract 4 similarity helpers: 1 hour
- Test similarity scoring: 30 min


Task 3.5: provider_config_manager.py (474 lines)

Current State: 474 lines, 3 long functions

Long Functions Identified:
1. _fetch_from_db(): 119 lines (256-374) - Fetch config from D1
2. _load_from_cache(): 96 lines (159-254) - Load config from YAML cache
3. _save_to_cache(): 77 lines (376-452) - Save config to YAML cache

Refactoring Approach:

For _fetch_from_db() (119 lines):

# Current: 119-line monolith
def _fetch_from_db(self, provider_id):
    # ... 119 lines fetching provider/channels/overrides ...

# After: Extract fetch operations
def _fetch_from_db(self, provider_id):
    provider = self._fetch_provider_record(provider_id)
    channels = self._fetch_provider_channels(provider_id)
    overrides = self._fetch_provider_overrides(provider_id)
    return self._assemble_provider_config(provider, channels, overrides)

def _fetch_provider_record(self, provider_id): ...  # 25 lines
def _fetch_provider_channels(self, provider_id): ...  # 30 lines
def _fetch_provider_overrides(self, provider_id): ...  # 25 lines
def _assemble_provider_config(self, provider, channels, overrides): ...  # 30 lines

For _load_from_cache() (96 lines):

# Current: 96-line monolith
def _load_from_cache(self, provider_id):
    # ... 96 lines loading YAML cache ...

# After: Extract load operations
def _load_from_cache(self, provider_id):
    cache_file = self._get_cache_file_path(provider_id)
    if not cache_file.exists():
        return None

    yaml_data = self._read_yaml_cache(cache_file)
    return self._parse_cached_config(yaml_data)

def _get_cache_file_path(self, provider_id): ...  # 10 lines
def _read_yaml_cache(self, cache_file): ...  # 15 lines
def _parse_cached_config(self, yaml_data): ...  # 40 lines

For _save_to_cache() (77 lines):

# Current: 77-line monolith
def _save_to_cache(self, provider_id, config):
    # ... 77 lines saving YAML cache ...

# After: Extract save operations
def _save_to_cache(self, provider_id, config):
    cache_file = self._get_cache_file_path(provider_id)
    yaml_data = self._serialize_config_to_yaml(config)
    self._write_yaml_cache(cache_file, yaml_data)

def _serialize_config_to_yaml(self, config): ...  # 35 lines
def _write_yaml_cache(self, cache_file, yaml_data): ...  # 20 lines

Error Handling to Add:
- Wrap D1 queries with try/except (database errors)
- Handle YAML parsing errors
- Handle file I/O errors
- Log failures with provider ID
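A hedged sketch of the wrapping pattern — `run_query` and `parse_yaml` are stand-ins for the real database client call and YAML parser, not the module's actual API:

```python
import logging

logger = logging.getLogger(__name__)

def fetch_provider_record(provider_id, run_query):
    """Fetch one provider row; DB errors are logged and yield None."""
    try:
        return run_query("SELECT * FROM providers WHERE id = ?", (provider_id,))
    except Exception as e:
        logger.error("DB fetch failed for provider %s: %s", provider_id, e)
        return None

def read_yaml_cache(cache_file, parse_yaml):
    """Read and parse the YAML cache file; I/O or parse errors yield None."""
    try:
        with open(cache_file, encoding="utf-8") as f:
            return parse_yaml(f.read())
    except Exception as e:
        logger.error("Cache read failed for %s: %s", cache_file, e)
        return None
```

Both helpers degrade to None so the caller can fall back (cache miss → DB fetch, DB failure → stale cache) instead of propagating the exception.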

Estimated Time: 3.5 hours
- Extract DB fetch helpers: 1.5 hours
- Extract cache load/save helpers: 1 hour
- Add error handling: 30 min
- Test with sample provider: 30 min


Task 3.6: provider_orchestrator.py (394 lines)

Current State: 394 lines, 1 long function

Long Function Identified:
1. process_all_providers(): 89 lines (133-221) - Process all providers with retry logic

Refactoring Approach:

For process_all_providers() (89 lines):

# Current: 89-line monolith
def process_all_providers(self, date, ...):
    # ... 89 lines of provider processing ...

# After: Extract processing steps
def process_all_providers(self, date, ...):
    providers = self._get_active_providers_for_processing()
    results = []

    with ThreadPoolExecutor(max_workers=self.max_workers) as executor:
        futures = self._submit_provider_jobs(executor, providers, date, ...)
        results = self._collect_provider_results(futures)

    return self._summarize_processing_results(results)

def _get_active_providers_for_processing(self): ...  # 15 lines
def _submit_provider_jobs(self, executor, providers, date, ...): ...  # 25 lines
def _collect_provider_results(self, futures): ...  # 30 lines
def _summarize_processing_results(self, results): ...  # 20 lines

Error Handling to Add:
- Wrap ThreadPoolExecutor with try/except
- Handle provider timeout errors
- Log concurrent processing failures
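A sketch of the collection step with batch timeout handling; the `worker`/`job` shapes are assumptions for illustration, not the real orchestrator API:

```python
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed

logger = logging.getLogger(__name__)

def run_provider_jobs(jobs, worker, max_workers=4, timeout=60):
    """Run worker(job) concurrently; a failing provider becomes an
    ('error', job, None) entry instead of crashing the whole batch."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(worker, job): job for job in jobs}
        # as_completed raises TimeoutError if the whole batch overruns `timeout`
        for future in as_completed(futures, timeout=timeout):
            job = futures[future]
            try:
                results.append(("ok", job, future.result()))
            except Exception as e:
                logger.error("Provider %s failed: %s", job, e)
                results.append(("error", job, None))
    return results
```

Note that `as_completed`'s timeout covers the batch as a whole; a per-provider deadline would instead pass a timeout to each `future.result()` call.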

Estimated Time: 2 hours
- Extract processing helpers: 1 hour
- Add error handling: 30 min
- Test with sample providers: 30 min


Task 3.7: scoped_team_extractor.py (313 lines)

Current State: 313 lines, 1 long function

Long Function Identified:
1. extract_team(): 94 lines (206-299) - Multi-scope team extraction

Refactoring Approach:

For extract_team() (94 lines):

# Current: 94-line monolith
def extract_team(self, text, ...):
    # ... 94 lines of scoped regex matching ...

# After: Extract scope matching
def extract_team(self, text, league_hint=None, sport_hint=None, ...):
    # Try league-scoped first
    if league_hint:
        match = self._try_league_scoped_extraction(text, league_hint)
        if match:
            return match

    # Try sport-scoped
    if sport_hint:
        match = self._try_sport_scoped_extraction(text, sport_hint)
        if match:
            return match

    # Fallback to global
    return self._try_global_extraction(text)

def _try_league_scoped_extraction(self, text, league): ...  # 25 lines
def _try_sport_scoped_extraction(self, text, sport): ...  # 25 lines
def _try_global_extraction(self, text): ...  # 30 lines

Estimated Time: 2 hours
- Extract scope matching helpers: 1 hour
- Test with sample teams: 1 hour


Task 3.8: enhanced_match_cache.py (304 lines)

Current State: 304 lines, NO long functions (longest is 42 lines)

Approach: Add error handling only (no function extraction needed)

Error Handling to Add:
1. store_match() (46 lines): Wrap cache writes with try/except
2. find_match() (42 lines): Handle missing cache keys
3. cleanup_expired() (23 lines): Handle concurrent cleanup
4. All methods: Add logging for debugging

Example:

# Before
def store_match(self, ...):
    self._by_tvg_id[tvg_id] = cached_match
    self._by_channel_name[channel_name] = cached_match

# After
def store_match(self, ...):
    try:
        self._by_tvg_id[tvg_id] = cached_match
        self._by_channel_name[channel_name] = cached_match
        logger.debug(f"Stored match for {channel_name}")
    except Exception as e:
        logger.error(f"Failed to store match: {e}")
        raise

Estimated Time: 1 hour
- Add try/except to 4 methods: 30 min
- Add logging statements: 15 min
- Test cache operations: 15 min


Implementation Strategy

Pattern: Function Extraction

Unlike Sprint 2's file splits, Sprint 3 uses function extraction:

When to Extract:
- Function >50 lines
- Clear logical sections (e.g., step 1, step 2, step 3)
- Repeated code blocks
- Complex nested logic

What to Extract:
- Processing steps (fetch → parse → save)
- Calculation components (team score + date score + ...)
- Validation logic
- Error handling blocks

What NOT to Extract:
- Simple loops (<10 lines)
- Single-purpose blocks already clear
- Coordinator logic that ties steps together

ROI-Based Decisions

Skip extraction if:
- Function is 50-60 lines but already clear
- Extraction would create more complexity than it removes
- Function is a coordinator that legitimately ties many steps

Enhanced match cache (Task 3.8) is an example: no long functions, just needs error handling.


Time Estimates

Per-Task Breakdown

| Task | File | Extraction | Error Handling | Testing | Total |
|------|------|------------|----------------|---------|-------|
| 3.1 | family_league_inference.py | 1.5h | 0.5h | 1h | 3h |
| 3.2 | logo_generator.py | 1h | 0.5h | 0.5h | 2h |
| 3.3 | match_debug_logger.py | 1h | 0h | 0.5h | 1.5h |
| 3.4 | match_suggestions.py | 1h | 0h | 0.5h | 1.5h |
| 3.5 | provider_config_manager.py | 2.5h | 0.5h | 0.5h | 3.5h |
| 3.6 | provider_orchestrator.py | 1h | 0.5h | 0.5h | 2h |
| 3.7 | scoped_team_extractor.py | 1h | 0h | 1h | 2h |
| 3.8 | enhanced_match_cache.py | 0h | 0.5h | 0.5h | 1h |
| Total | 8 files | 9h | 2.5h | 5h | 16.5h |

Estimated Duration: 2-3 days (with buffer)


Success Criteria

Code Quality

- All functions <50 lines - Zero functions exceeding 50 lines ✅
- Error handling added - All risky operations wrapped with try/except ✅
- Logging improved - Debug/error logging for troubleshooting ✅
- All imports passing - No broken imports after refactoring ✅
- Backward compatibility - 100% compatible with existing code

Testing

- Existing tests passing - All tests continue to pass ✅
- Manual testing - Test key workflows with sample data ✅
- Import verification - Verify all imports work
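Import verification can be scripted with a small helper like the one below (the module list passed in would be the Batch 3A service modules; names here are illustrative):

```python
import importlib

def verify_imports(module_names):
    """Try to import each named module; return (name, error) pairs for failures."""
    failures = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as e:
            failures.append((name, str(e)))
    return failures
```

Running it over the refactored modules after each extraction gives a quick smoke check that no helper move broke an import chain.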

Documentation

- Completion reports - Create task completion .md for each file ✅
- Code comments - Add docstrings to extracted helpers


Risk Mitigation

Medium-Risk Areas

  1. logo_generator.py: PIL image processing can fail in unexpected ways
     - Mitigation: Comprehensive error handling + test with real logos
  2. provider_config_manager.py: Supabase database + YAML caching is complex
     - Mitigation: Test with staging Supabase database first
  3. provider_orchestrator.py: ThreadPoolExecutor concurrency issues
     - Mitigation: Add timeout handling + test with multiple providers

Low-Risk Areas

  • family_league_inference.py: Pure logic, no external dependencies
  • match_suggestions.py: Simple similarity calculations
  • scoped_team_extractor.py: Regex matching (well-tested)
  • enhanced_match_cache.py: In-memory cache (simple)

Dependencies

No external blockers - All work is internal refactoring

Internal dependencies:
- Sprint 2 completion (✅ Done)
- Staging Supabase database access (for Task 3.5 testing)
- Sample provider data (for Task 3.6 testing)


Next Steps

Week 9 Execution

  1. Day 1: Tasks 3.1, 3.2 (5 hours)
  2. Day 2: Tasks 3.3, 3.4, 3.8 (4 hours)
  3. Day 3: Tasks 3.5, 3.6, 3.7 (7.5 hours)

Total: ~16.5 hours over 3 days

Week 10 Preview

After Week 9 completion, proceed to Sprint 3 Week 10 (Batch 3B):
- enhanced_event_matcher.py (363L)
- enhanced_team_matcher.py (460L)
- database/connection.py (369L)
- database/migration_runner.py (386L)
- parsers/provider_m3u_parser.py (370L)
- clients/espn_api_client.py (396L)
- clients/tv_schedule_client.py (461L)

Total Batch 3B: 7 files (~2,800 lines)


Appendix

Long Functions Summary

By Severity:

- Critical (>100 lines): 2 functions
  - match_debug_logger._export_excel(): 181 lines
  - provider_config_manager._fetch_from_db(): 119 lines

- High (75-100 lines): 6 functions
  - logo_generator.generate_split_logo(): 99 lines
  - provider_config_manager._load_from_cache(): 96 lines
  - scoped_team_extractor.extract_team(): 94 lines
  - provider_orchestrator.process_all_providers(): 89 lines
  - family_league_inference._infer_from_event_context(): 78 lines
  - provider_config_manager._save_to_cache(): 77 lines

- Medium (50-75 lines): 3 functions
  - family_league_inference._infer_from_teams(): 74 lines
  - family_league_inference.infer_leagues(): 63 lines
  - match_suggestions.calculate_similarity(): 56 lines

Total: 11 long functions across 7 files


Plan Version: 1.0
Created: 2025-11-05
Sprint 2 Status: ✅ Complete
Sprint 3 Week 9 Status: 📋 Ready for Execution
Next Action: Begin Task 3.1 (family_league_inference.py)